This is an example analysis of a mock dataset using R Notebooks. R Notebooks let you create dynamic documents that combine R code with its output, and they support Markdown formatting.
library(tidyverse)
library(cowplot)
library(plotly)
library(broom)
We will use read_csv() to read in the raw .csv file that is hosted on my GitHub account. Notice how we use the assignment operator to store the result of read_csv() in an object named ‘data’. When you call data, you can see what the data look like. Notice that it’s in the ‘wide’ format.
file <- "https://raw.githubusercontent.com/nguyens7/nguyens7.github.io/master/data/Antibiotics.csv"
data <- read_csv(file)
data
Next we will use the gather() function to make the data ‘long’. Here we create two new columns, Bacteria and Count, and reshape the data from ‘wide’ to ‘long’ format.
data2 <- data %>%
gather(Bacteria, Count, 2:10)
data2
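As an aside, in newer versions of tidyr, gather() has been superseded by pivot_longer(). A sketch of the equivalent call, assuming the same ‘data’ object:

```r
# Equivalent reshape with the newer tidyr API (gather() is superseded)
data2 <- data %>%
  pivot_longer(cols = 2:10, names_to = "Bacteria", values_to = "Count")
```

Both produce the same long-format data frame; pivot_longer() just makes the new column names explicit arguments.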
Now that the data are in the long format, we can use the separate() function to split values within a column. We’ll create two new columns named Experiment and Organism, and tell R to split on the underscore by setting sep = "_".
data3 <- data2 %>%
separate(Bacteria, into = c("Experiment","Organism"), sep = "_")
#Make the Treatment and Organism columns as categorical 'factors'
data3$Treatment <- as.factor(data3$Treatment)
data3$Organism <- as.factor(data3$Organism)
data3
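For reference, recent tidyr releases (1.3 and later) supersede separate() with separate_wider_delim(). A hedged sketch of the equivalent step, assuming that version of tidyr is installed:

```r
# Equivalent split using the newer tidyr API (requires tidyr >= 1.3),
# with the factor conversion folded into the same pipeline
data3 <- data2 %>%
  separate_wider_delim(Bacteria, delim = "_",
                       names = c("Experiment", "Organism")) %>%
  mutate(Treatment = as.factor(Treatment),
         Organism  = as.factor(Organism))
```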
Next we’ll use the group_by() function to ‘lock in’ specific categories and then use the summarise()/summarize() function to collapse the ‘long’ data into a smaller data frame. Here we average the five technical replicates within each experiment and calculate the standard deviation and standard error of the mean.
data4 <- data3 %>%
group_by(Organism, Treatment, Experiment) %>%
summarise( N = length(Count),
mean = mean(Count),
sd = sd(Count),
se = sd/sqrt(N))
data4
#write_csv(data4,"experimental_summary.csv")
We’ll do the same thing again but average each of the three experimental replicates.
data5 <- data4 %>%
group_by(Organism,Treatment) %>%
summarise( avg_N = length(mean),
average = mean(mean),
avg_sd = sd(mean),
avg_se = avg_sd/sqrt(avg_N))
data5
#write_csv(data5,"final_summary.csv")
We can create a box plot to get an idea of how the data points are distributed. We start by defining the x-axis and y-axis aesthetics, and we can also map color to Organism. Next we add a boxplot layer and jittered points to the graph. Finally, we use the facet_wrap() function to split the plot into panels, in this case by Treatment.
boxplot <- data3 %>%
ggplot(aes(x = Organism, y = Count, color = Organism))+
geom_boxplot(colour="black", fill=NA) +
geom_point(position= 'jitter', size=2) +
ylab("\nColony Count\n") + # Y axis label
ggtitle("Effect of Antibiotic on Bacterial Growth") + #title
facet_wrap(~Treatment)
boxplot
We can also use the plotly package to make the graphs interactive.
ggplotly(boxplot)
We can use the ggplot() function to create a bar graph and also plot error bars. Notice how the tidy data makes this straightforward to plot.
#Bar graph of experiments
Experimental_Replicates <- data4 %>%
ggplot(aes(x=Organism,y=mean,fill=Treatment))+
geom_col(position="dodge")+
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.4,
size=0.8, colour="black", position=position_dodge(.9)) + #error bars
scale_y_continuous(expand=c(0,0))+ #set bottom of graph
xlab("Organism") + # X axis label
ylab("\nColony Count\n") + # Y axis label
ggtitle("Effect of Antibiotic on Bacterial Growth")+ #title
facet_wrap(~Experiment)
Experimental_Replicates
#Final graph averaged across experimental replicates
Final_plot <- data5 %>%
ggplot(aes(x=Organism,y=average,fill=Treatment))+
geom_col(position="dodge")+
geom_errorbar(aes(ymin=average-avg_se, ymax=average+avg_se), width=.4,
size=0.8, colour="black", position=position_dodge(.9)) + #error bars
scale_y_continuous(expand=c(0,0))+ #set bottom of graph
xlab("Organism") + # X axis label
ylab("\nColony Count\n") + # Y axis label
ggtitle("Effect of Antibiotic on Bacterial Growth")
Final_plot
R is really useful for performing statistical analysis on your data. For this mock dataset, I chose to run a Shapiro-Wilk test of normality, which checks whether the data meet the normality assumption required by parametric tests like ANOVA. I use the shapiro.test() function and specify the Count column from the data3 data frame. Next I use the tidy() function to create a neat data frame of the test statistics.
#Parametric Test
shapiro <- shapiro.test(data3$Count)
tidy(shapiro)
We fail to reject H0, so the data are consistent with a normal distribution, and we can proceed with a parametric test.
Next we can perform an ANOVA, which follows a simple formula interface. Use the aov() function and specify the numeric response, then ‘~’, then the explanatory variables you want to compare; in our case we model mean counts as a function of Organism and Treatment, including their interaction (Organism*Treatment).
ANOVA <- aov(mean~(Organism*Treatment),data=data4)
tidy(ANOVA)
We can then perform a post hoc test using Tukey’s HSD via the TukeyHSD() function.
#Tukey HSD
HSD <- TukeyHSD(ANOVA)
tukey <- tidy(HSD)
tukey
We can filter for adjusted p-values below 0.05 and arrange the significant comparisons in ascending order.
#Aggregate significant results
sig.tukey <- tukey %>%
filter(adj.p.value<0.05) %>%
arrange(adj.p.value)
sig.tukey